Try Textual Inversion - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

Try Textual Inversion

https://gyazo.com/461f9aa8415fdc0e45e20a4eb9d42dec

Using 5 photos of our cat as training data, I recalculated the embedding vector of a new token with Textual Inversion, then used the new token in prompts to generate images with Stable Diffusion.
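As a rough illustration of what "recalculating the embedding vector of a new token" means, here is a toy sketch in plain Python. It is not Stable Diffusion and not the actual Textual Inversion code: the "model" is a hypothetical frozen linear map, the dimension is 4 instead of 768, and every name (`frozen_model`, `train_new_token`, `hidden`) is made up for illustration. The only point it tries to show is that the model weights stay fixed while gradient descent updates just the one new vector.

```python
import random

DIM = 4  # toy size; the real embedding discussed on this page is 768-dimensional

# Hypothetical frozen "model": a fixed linear map from an embedding to an output.
random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]


def frozen_model(vec):
    """Apply the fixed weights W; W is never updated."""
    return [sum(W[i][j] * vec[j] for j in range(DIM)) for i in range(DIM)]


def loss(vec, target):
    """Squared error between the model output and the target."""
    out = frozen_model(vec)
    return sum((o - t) ** 2 for o, t in zip(out, target))


def train_new_token(target, steps=2000, lr=0.05):
    """Gradient descent on the new token's vector only."""
    vec = [0.0] * DIM  # the single trainable embedding
    for _ in range(steps):
        out = frozen_model(vec)
        err = [o - t for o, t in zip(out, target)]
        # d(loss)/d(vec[j]) = 2 * sum_i err[i] * W[i][j]
        grad = [2 * sum(err[i] * W[i][j] for i in range(DIM)) for j in range(DIM)]
        vec = [v - lr * g for v, g in zip(vec, grad)]
    return vec


# Stand-in for "what the training photos look like" in this toy setting.
hidden = [0.5, -0.2, 0.8, 0.1]
target = frozen_model(hidden)

W_before = [row[:] for row in W]
learned = train_new_token(target)
```

After training, `loss(learned, target)` should be lower than the loss of the initial all-zero vector, while `W` is unchanged. In the real setting the frozen part is the whole image generation model and the target comes from the training photos.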

Training data

https://gyazo.com/760f4eb8f518290288ed0a87b9b26e1a

AI-generated photo, AI-generated Monet-style painting

https://gyazo.com/ffa21fa2d8ab09a2a76d41459efcbef9 https://gyazo.com/29787da6b19012fc721907eeb6f3a285

By the way, the prompts are something like "a photo of our cat" or "a painting of our cat by Claude Monet". If you change the "our cat" part to "cat", you get the following. The results are more polished as cats, but they capture the characteristics of "our cat" less well.

https://gyazo.com/a280b7afa562217920b6b3e6427b1a0d https://gyazo.com/960a4befc7d6f2f8c0d4ca6f3fbebc76

My cat probably has the three-color coat pattern of black, orange, and white, but of a type in which the black pigment is missing and the orange is much paler.

https://gyazo.com/73cf1b06b231de0dcb32f31cba3ef74e https://gyazo.com/29bbe5396e9c4a16f1ac0a765b4af8e4

The embedding file generated by Textual Inversion is about 5KB. Its main content is a 768-dimensional float vector, plus a few details about the token.
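As a quick sanity check on that size, here is the arithmetic as a small Python sketch. It assumes the values are stored as 32-bit floats and treats "about 5KB" as 5 KiB; neither is confirmed by the file itself, so the numbers are only an estimate.

```python
DIM = 768              # dimensionality stated above
BYTES_PER_FLOAT32 = 4  # assumption: values are stored as 32-bit floats

vector_bytes = DIM * BYTES_PER_FLOAT32  # raw size of the vector alone
file_bytes = 5 * 1024                   # "about 5KB", taken as 5 KiB for this estimate

# Whatever is left would be token info and serialization overhead.
overhead_bytes = file_bytes - vector_bytes
```

Under these assumptions the vector itself accounts for roughly 3KB, which is consistent with a file of about 5KB once the token information and file format overhead are included.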

---

Impressions

@nishio: Right now my impression is "not much of a resemblance", but compared with random cat photos it has clearly picked up some distinguishing features, so I have a feeling that within a few years there will be a lot of people who keep messing around in search of a face.

For example, if you train on photos of a daughter who died young, generate hundreds of images every day, and select the ones you like, you would be creating new photos of the daughter who "lives on in my heart": commemorative photos at sightseeing spots she never visited, field day photos, wedding photos...

Here is a virtual souvenir photo from when I took my cat, a strictly indoor cat, to the virtual ocean!

https://gyazo.com/f71c768dc578e7a8e678af69f9b76bc6

"Wedding photos."

Ah, so you could generate "your idea of an ideal son-in-law", match them up, marry them, and then start generating pictures of "grandchildren" who never existed...

This kind of "virtual reality" sounds like a bad idea. If there is demand, there will be providers, and then comes the tragedy of losing your virtual grandchildren when a provider goes out of business...

I never really understood the market for the metaverse, in the sense of creating avatars that look like real people from photos, but I guess it will develop into "the metaverse as a world where dead people continue to live"...

A daughter who died young and then came of age in the metaverse is locked in by Meta (hell).

My virtual daughter and son-in-law are raising their non-existent grandchildren in a beautiful non-existent house by a non-existent lake, living off a non-existent farm, all locked in by Meta, with the maintenance fees deducted from my account on a subscription model. Then, just when someone notices that the user hasn't logged in lately, it turns out the user has died, but the account was never cancelled, so it keeps getting debited (hell).

I saw a response about "training on photos of your favorite idol" and realized there can be hell even if the subject is still alive. There is likely to be a large amount of training data, so improving the quality of the face should be easy. The real person keeps aging, but a fan who says "No, I liked how they looked at 20" stops that growth and keeps them in the metaverse forever.

Idol breeding

There are going to be hundreds of people who will remove the porn filter.

Can this be done with a realistic number of images for ordinary people who are not idols or the like? Might a culture emerge of not taking off your mask in front of anyone but trusted people, or of covering your face and showing it only to family members?

---

Bowman

https://scrapbox.io/files/6323fdeeff937700225f1963.png

https://scrapbox.io/files/6323fdf1ff25a80021c2e9e2.png

I got a very good one! I was so excited, but this turned out to be the best case: even after generating more than 100 images afterwards, I could not produce anything better.

https://gyazo.com/d3467678eae4ca379c5af5118ec9128a

It was interpreted as "Bowman usually comes with a local dish" (lol).

Too many outputs have the food as the main subject, even though I told it "this is a CHARACTER" at training time.

In fact, this was the first experiment. After getting excited and then leaving it alone for a while, I decided to try it with real photos, which became the cat experiment above.

Live-action Bowman

https://gyazo.com/33ea488b2647313b64c54627f439a519

https://gyazo.com/bd735fcae08827967d2e32798ea3075a

It's technically interesting that, with no prior information, it picked up various things such as "texture", "colors that tend to be used", "a CO-like logo", and "size relative to people" from the images I gave it... but I guess consumers won't be satisfied with this quality, right?

Results can be seed sensitive. If you're unsatisfied with the model, try re-inverting with a new seed (by adding --seed <#> to the prompt).

If you pull a gacha that takes an hour per pull 100 times, you may or may not get a good one.

Since the result of training is a single 768-dimensional vector, we may be able to search efficiently by selecting only the good ones among multiple vectors and then averaging them or applying a genetic algorithm (GA).

In the end, it's an optimization problem in 768-dimensional space where the evaluation function is a human.
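The averaging and GA ideas above could be sketched roughly as follows. This is a minimal sketch in plain Python, not something that was actually run on the trained embeddings: the vectors are random stand-ins, the `scores` are hypothetical human ratings, and the function names, the uniform crossover, and the Gaussian mutation are all illustrative choices rather than a recommended design.

```python
import random

DIM = 768


def average(vectors):
    """Element-wise mean of the vectors a human marked as good."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def crossover(a, b, rng):
    """Uniform crossover: take each component from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(vec, rng, sigma=0.01):
    """Add small Gaussian noise to every component."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def next_generation(population, scores, rng, keep=2, size=None):
    """One GA step: keep the top-scored vectors, refill with mutated children.

    Requires keep >= 2 so that two distinct parents can be sampled.
    """
    size = size or len(population)
    ranked = [v for _, v in sorted(zip(scores, population),
                                   key=lambda p: p[0], reverse=True)]
    parents = ranked[:keep]
    children = []
    while len(parents) + len(children) < size:
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return parents + children


rng = random.Random(0)
# Random stand-ins for embeddings obtained from runs with different seeds.
population = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(6)]
scores = [3, 5, 1, 4, 2, 5]  # hypothetical human ratings, higher is better

good = [v for v, s in zip(population, scores) if s >= 4]
mean_vec = average(good)
new_population = next_generation(population, scores, rng)
```

Each new candidate vector would still have to be turned into images and rated by a person, so the human scoring step remains the bottleneck of the loop.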

I have a feeling that with enough effort the cat could reach a satisfactory level, but Bowman could not.

Bowman can't be represented by one token; he would probably need about three, for example one each for the face, the logo, and the outfit.

Training on the Unexplored logo

https://scrapbox.io/files/632401548ada340022341544.png

https://scrapbox.io/files/63240156cb72b60022b12ec9.jpg

I tried varying the background to indicate which part of the image is the logo, but it still doesn't seem to work.

I guess you'd have to photograph a logo-shaped 3D object placed in various locations.

It seems to have understood it as "an abstract image with greenish, diagonal and horizontal lines" rather than as "the Unexplored logo".

---

This page is auto-translated from /nishio/Textual Inversionを試してみる using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.